|
Text semantic classification algorithm based on risk decision
CHENG Yusheng, LIANG Hui, WANG Yibin, LI Kang
Journal of Computer Applications
2016, 36 (11):
2963-2968.
DOI: 10.11772/j.issn.1001-9081.2016.11.2963
Most of traditional text classification algorithms are based on vector space model and hierarchical classification tree model is used for statistical analysis. The model mostly doesn't combine with the semantic information of characteristic items. Therefore it may produce a large number of frequent semantic modes and increase the paths of classification. Combining with the good distinguishment characteristic of essential Emerging Pattern (eEP) in the classification and the model of rough set based on minimum expected risk decision, a Text Semantic Classification algorithm with Threshold Optimization (TSCTO) was presented. Firstly, after obtaining the document feature frequency distribution table, the minimum threshold value was calculated by the rough set combined with distribution density matrix. Then the high frequency words of the semantic intra-class document frequency are obtained by combining semantic analysis and inverse document frequency method. In order to get the simplest model, the eEP pattern was used for classification. Finally, using similarity formula and HowNet semantic relevance degree, the score of text similarity was calculated, and some thresholds were optimized by the three-way decision theory. The experimental results show that the TSCTO algorithm has a certain improvement in the performance of text classification.
Reference |
Related Articles |
Metrics
|
|